Grouping business news stories based on salience of named entities

Authors

  • Roman Yangarber
  • Llorenç Escoter
  • Lidia Pivovarova
  • Mian Du
  • Anisia Katinskaia

Abstract

In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end-user—reducing the cognitive load on the reader, as well as signaling the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents: from a baseline using keywords, to a method using salience—a measure of importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually-annotated corpus of business news stories.
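The abstract does not spell out the grouping algorithm or how salience is computed, so the following is only a minimal sketch of the general idea, under stated assumptions: each document is represented as a sparse vector mapping named entities to hypothetical salience weights, and a greedy single-pass procedure assigns each document to the most similar existing group by cosine similarity. The function names, the threshold value, and the example weights are all illustrative and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_documents(vectors, threshold=0.5):
    """Greedy single-pass grouping (an assumed stand-in, not the paper's method).

    Each document joins the existing group whose summed vector is most
    similar to it, provided the similarity reaches `threshold`;
    otherwise it starts a new group. Returns lists of document indices.
    """
    groups = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for g in groups:
            sim = cosine(vec, g["centroid"])
            if sim >= best_sim:
                best, best_sim = g, sim
        if best is None:
            groups.append({"centroid": dict(vec), "members": [i]})
        else:
            # Summing vectors is enough here: cosine ignores vector length,
            # so the sum points in the same direction as the mean.
            for k, w in vec.items():
                best["centroid"][k] = best["centroid"].get(k, 0.0) + w
            best["members"].append(i)
    return [g["members"] for g in groups]


# Hypothetical entity-salience vectors for three articles.
docs = [
    {"Nokia": 0.9, "Microsoft": 0.6},
    {"Nokia": 0.8, "Microsoft": 0.7, "Finland": 0.1},
    {"Tesla": 0.9, "SpaceX": 0.3},
]
print(group_documents(docs))
```

In this toy input the first two articles share their most salient entities and end up in one group, while the third starts its own. A keyword baseline would use the same procedure with term weights in place of entity salience.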

Similar resources

Measuring novelty and redundancy with multiple modalities in cross-lingual broadcast news

News videos from different channels and languages are broadcast every day, providing abundant information for users. To effectively search, retrieve, browse and track news stories, news story similarity plays a critical role in assessing the novelty and redundancy among news stories. In this paper, we explore different measures of novelty and redundancy detection for cross-lingual news stories....

Multimedia interaction for the new millennium

Spoken language processing has created value in multiple application areas such as document transcription, database entry, and command and control. Recently, scientists have been focusing on a new class of application that promises on-demand access to multimedia information such as radio and broadcast news. In separate research, augmenting traditional graphical interfaces with additional modali...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

Discovery of Unknown Events From Multi-lingual News

We have proposed a new approach to detect topically-related events from multi-lingual news sources. In particular, we are interested in Chinese and English on-line newswire stories. Three categories of named-entity terms, namely people names, geographical location names, and organization names, together with the story content terms, constitute the basis for story representation. The named ent...

Large-Scale Named Entity Disambiguation Based on Wikipedia Data

This paper presents a large-scale system for the recognition and semantic disambiguation of named entities based on information extracted from a large encyclopedic collection and Web search results. It describes in detail the disambiguation paradigm employed and the information extraction process from Wikipedia. Through a process of maximizing the agreement between the contextual information ex...

Publication date: 2017